Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS

Authors

  • Zhihui Luo
  • Robert Duffy
  • Stephen Johnson
  • Chunhua Weng
Abstract

We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract and normalize UMLS-recognizable terms. When annotated with Semantic Network types, the corpus had a lexical ambiguity of 1.57 (total types for unique lexemes / total unique lexemes) and a word occurrence ambiguity of 1.96 (total type occurrences / total word occurrences). A set of semantic preference rules was developed and applied to completely eliminate ambiguity in semantic type assignment. The lexicon covered 95.95% of the UMLS-recognizable terms in our corpus. A total of 20 UMLS semantic types, representing about 17% of all the distinct semantic types assigned to corpus lexemes, covered about 80% of the vocabulary of our corpus.
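
The two ambiguity ratios defined in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `ambiguity_metrics`, the toy tokens, and the lexeme-to-type assignments below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter


def ambiguity_metrics(tokens, types_for):
    """Return (lexical ambiguity, word occurrence ambiguity).

    tokens    -- list of normalized lexemes, repeats included
    types_for -- dict mapping each lexeme to its set of semantic types
    """
    counts = Counter(tokens)
    # Lexical ambiguity: total types for unique lexemes / total unique lexemes.
    lexical = sum(len(types_for[w]) for w in counts) / len(counts)
    # Word occurrence ambiguity: total type occurrences / total word occurrences,
    # so each lexeme's type count is weighted by how often the lexeme occurs.
    occurrence = sum(len(types_for[w]) * n for w, n in counts.items()) / sum(counts.values())
    return lexical, occurrence


# Hypothetical toy data (illustrative assignments, not taken from the paper).
toy_types = {
    "cold": {"Disease or Syndrome", "Natural Phenomenon or Process"},
    "pregnancy": {"Organism Function"},
    "aspirin": {"Pharmacologic Substance", "Organic Chemical"},
}
toy_tokens = ["cold", "cold", "pregnancy", "aspirin"]

lexical, occurrence = ambiguity_metrics(toy_tokens, toy_types)
print(lexical, occurrence)
```

In this toy corpus the three unique lexemes carry 5 types in total, giving a lexical ambiguity of 5/3, while the four word occurrences carry 7 type occurrences, giving a word occurrence ambiguity of 7/4. The second ratio exceeds the first whenever ambiguous lexemes are also the more frequent ones, which matches the direction of the reported 1.57 and 1.96.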

Similar resources

Dynamic categorization of clinical research eligibility criteria by hierarchical clustering

OBJECTIVE To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity. DESIGN The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hie...

Identifying Most Relevant Concepts to Describe Clinical Trial Eligibility Criteria

Since eligibility criteria of clinical trials are represented as free text, their automatic interpretation and the evaluation of patient eligibility is challenging. Our approach to the criteria processing is based on the identification of contextual patterns and semantic concepts that together define the machine-interpretable meaning. The goal of this research is to find the most relevant conce...

ECRL: an eligibility criteria representation language based on the UMLS Semantic Network.

We propose a formal representation language to represent, share and reuse eligibility criteria in clinical research protocols towards the goal of automated eligibility identification. The language is an extension over the UMLS Semantic Network and can be transformed into other computable representations.

Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis

OBJECTIVE To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources. DESIGN Based on the occurrences of UMLS terms in a 51 million document corpus of Mayo Clinic clinical notes, this study computes statistics about the terms' str...

Towards a Semantic Lexicon for Biological Language Processing

This paper explores the use of the resources in the National Library of Medicine's Unified Medical Language System (UMLS) for the construction of a lexicon useful for processing texts in the field of molecular biology. A lexicon is constructed from overlapping terms in the UMLS SPECIALIST lexicon and the UMLS Metathesaurus to obtain both morphosyntactic and semantic information for terms, and t...

Journal title:

Volume 2010, issue

Pages -

Publication date: 2010